Goto

Collaborating Authors

 normalized loss


Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

Balef, Amir Rezaei, Vernade, Claire, Eggensperger, Katharina

arXiv.org Artificial Intelligence

The Combined Algorithm Selection and Hyperparameter optimization (CASH) is a challenging resource allocation problem in the field of AutoML. We propose MaxUCB, a max k-armed bandit method to trade off exploring different model classes and conducting hyperparameter optimization. MaxUCB is specifically designed for the light-tailed and bounded reward distributions arising in this setting and, thus, provides an efficient alternative compared to classic max k-armed bandit methods assuming heavy-tailed reward distributions. We theoretically and empirically evaluate our method on four standard AutoML benchmarks, demonstrating superior performance over prior approaches. We make our code and data available at https://github.com/amirbalef/CASH_with_Bandits


Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks

Peleg, Amit, Hein, Matthias

arXiv.org Artificial Intelligence

The implicit bias Neural networks typically generalize well when of Stochastic Gradient Descent (SGD) is thus often thought fitting the data perfectly, even though they are to be the main reason behind generalization (Arora et al., heavily overparameterized. Many factors have 2019; Shah et al., 2020). A recent thought-provoking study been pointed out as the reason for this phenomenon, by Chiang et al. (2023) suggests the idea of the volume hypothesis including an implicit bias of stochastic for generalization: well-generalizing basins of the gradient descent (SGD) and a possible simplicity loss occupy a significantly larger volume in the weight space bias arising from the neural network architecture. of neural networks than basins that do not generalize well. The goal of this paper is to disentangle the factors They argue that the generalization performance of neural that influence generalization stemming from optimization networks is primarily a bias of the architecture and that the and architectural choices by studying implicit bias of SGD is only a secondary effect. To this end, random and SGD-optimized networks that achieve they randomly sample networks that achieve zero training zero training error. We experimentally show, in error (which they term Guess and Check (G&C)) and argue the low sample regime, that overparameterization that the generalization performance of these networks is in terms of increasing width is beneficial for generalization, qualitatively similar to networks found by SGD. and this benefit is due to the bias of SGD and not due to an architectural bias. In contrast, In this work, we revisit the approach of Chiang et al. (2023) for increasing depth, overparameterization and study it in detail to disentangle the effects of implicit is detrimental for generalization, but random and bias of SGD from a potential bias of the choice of architecture. SGD-optimized networks behave similarly, so this As we have to compare to randomly sampled neural can be attributed to an architectural bias.


Measuring Feature Sparsity in Language Models

Deng, Mingyang, Tao, Lucas, Benton, Joe

arXiv.org Artificial Intelligence

Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers.


Azure Machine Learning - Automobile Price Prediction Tutorial

#artificialintelligence

In this article, we'll go through a hands-on experience to build a machine learning model to predict price of automobiles. The previous article explored about Azure Machine Learning and we went through a step-by-step process to create Machine Learning Workspace in Azure, creating the compute instances and compute cluster. This article builds up to the last article – designing a full-on machine learning project. We explore on developing a machine learning model for Automobile Price Prediction in this article. The machine learning workflow are explained and discussed in detail in the process.